{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# $ R \\times C $ 列联表资料的 $ \\chi^2 $ 检验中多个样本之间的多重比较\n",
    "\n",
    "## 案例\n",
    "\n",
    "在B_09_6的基础上，探究三种治疗方法方案两两比较，治疗疾病的有效率有无区别\n",
    "\n",
    "### 数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (3, 3)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>group</th><th>positive</th><th>negative</th></tr><tr><td>str</td><td>i64</td><td>i64</td></tr></thead><tbody><tr><td>&quot;modern&quot;</td><td>51</td><td>49</td></tr><tr><td>&quot;ancient&quot;</td><td>35</td><td>45</td></tr><tr><td>&quot;combined&quot;</td><td>59</td><td>15</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (3, 3)\n",
       "┌──────────┬──────────┬──────────┐\n",
       "│ group    ┆ positive ┆ negative │\n",
       "│ ---      ┆ ---      ┆ ---      │\n",
       "│ str      ┆ i64      ┆ i64      │\n",
       "╞══════════╪══════════╪══════════╡\n",
       "│ modern   ┆ 51       ┆ 49       │\n",
       "│ ancient  ┆ 35       ┆ 45       │\n",
       "│ combined ┆ 59       ┆ 15       │\n",
       "└──────────┴──────────┴──────────┘"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import polars as pl\n",
    "\n",
    "df = pl.read_csv(\"B_09_6.csv\")\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 分析\n",
    "\n",
    "此案例为 $ 3 \\times 2 $ 联表。对检验水平进行矫正。\n",
    "\n",
    "$ \\alpha' = \\frac{\\alpha}{\\left( \\begin{array}{c} k \\\\ 2 \\end{array} \\right)} $\n",
    "\n",
    "也可以保持 $ \\alpha $ 不变，对 $ p $ 值进行校正。\n",
    "\n",
    "### 假设\n",
    "\n",
    "$ H_0 $: 任意两对比组的总体有效率想等  \n",
    "$ H_1 $: 存在某两对比组的总体有效率不相等\n",
    "\n",
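 注：此处课本的">
    "> Note: the textbook states $ H_1 $ here as: the population efficacy rates are unequal for every pair of compared groups. This is clearly wrong, because the two hypotheses are then not complementary: neither covers the case where only some pairs differ. The hypotheses in textbook Example 9-7 are stated correctly. For details, see [personal notes on p-values](../个人心得/关于%20p%20值.md)\n",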
    "\n",
    "### 检验"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "combination = ('modern', 'ancient')\n",
      "chi square value = 0.6682644730331515\n",
      "p value = 0.4136573458189974\n",
      "significent: False\n",
      "\n",
      "combination = ('modern', 'combined')\n",
      "chi square value = 13.886069064803442\n",
      "p value = 0.00019423282997657582\n",
      "significent: True\n",
      "\n",
      "combination = ('ancient', 'combined')\n",
      "chi square value = 19.440069304677017\n",
      "p value = 1.0380617158618243e-05\n",
      "significent: True\n",
      "\n",
      "All significant: True\n"
     ]
    }
   ],
   "source": [
    "from scipy.stats import chi2_contingency\n",
    "from itertools import combinations\n",
    "\n",
    "combines = list(combinations(df.select(\"group\").to_series().to_list(), 2))\n",
    "\n",
    "corrected_alpha = 0.05 / len(combines) # according to Bonferroni correction\n",
    "\n",
    "result = [[float(res.statistic), float(res.pvalue)] for res in map(chi2_contingency, [df.filter(pl.col(\"group\").is_in(i)).select(\"positive\",\"negative\") for i in combines])]\n",
    "\n",
    "result: list[tuple[str, str], list[float], bool] = list(zip(combines, result, [p < corrected_alpha for p in map(lambda x: x[1], result)]))\n",
    "\n",
    "print(f\"\".join(map(\n",
    "    lambda x: f\"\"\"\n",
    "combination = {x[0]}\n",
    "chi square value = {x[1][0]}\n",
    "p value = {x[1][1]}\n",
    "significent: {x[2]}\n",
    "\"\"\", result\n",
    ")))\n",
    "print(f\"All significant: {any(map(lambda x: x[2], result))}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$ \\exists \\ p_i,\\ i \\in [1,3],\\ p_i \\leqslant \\alpha' $，因而拒绝原假设，认为西药组和中西药结合组、中药组和中西药结合组的总体有效率不相等。\n",
    "\n",
 注：课本在计算时">
    "> Note: the textbook does not apply Yates' correction for continuity, so its $ \\chi^2 $ values are larger than those computed here.  \n",
    "> Passing `correction=False` to `chi2_contingency` reproduces the textbook's results."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
